SGML is a metalanguage suitable for describing all kinds of markup languages, including HTML. It is also an International Organization for Standardization standard (ISO 8879) for specifying, defining, and creating documents independently of platform and display differences that are irrelevant to the delivery and rendering of those documents' contents. In other words, SGML is a language for formally defining document types (classes or kinds of documents) and document instances (where a kind of document is implemented with its own unique content). For instance, a Creole cookbook and an Italian cookbook are both obviously cookbooks (type), but each contains vastly different recipes (instances).
SGML is an international standard used for the formal definition of device-, system-, and application-independent electronic text. In other words, SGML is a metalanguage--a language used to describe other languages--that formally defines a descriptive markup language. A descriptive markup language is one that uses explicit markup, called tags, to describe what a document's structure and function are, rather than forcing that document to behave in certain ways (as is the case with most word processors, layout programs, and other software used to create documents for everyday use).
The power of SGML comes from its structure-driven approach to describing the contents of a document. In fact, SGML identifies each part of a document by its purpose and role. SGML does not describe a document's appearance; it leaves presentation of documents to browsers and print-formatting applications.
SGML originated with work begun at IBM in the 1960s to overcome the problems inherent in moving documents among multiple hardware platforms and operating systems. IBM's efforts were called GML, for Generalized Markup Language. GML was originally targeted for internal use at IBM rather than as a generic way of representing documents. This was the first publish-once, multiplatform strategy for document preparation--a strategy that's become essential for managing large, complex publication environments today.
GML's originators--Charles Goldfarb, Ed Mosher, and Ray Lorie (the original "GML")--realized by the early 1970s that a more general form of markup would make documents portable from any one system to any other. Their work led in the 1980s to the definition and birth of SGML, which is governed today by the ISO 8879 standard and used around the world.
SGML is a powerful and complex tool for representing documents of all kinds. It offers the ability to create specifications for many types of documents, which can then be used to define and build individual document instances conforming to those specifications.
A large variety of government agencies, vendor consortia, and industry organizations have adopted SGML. The Department of Defense (DoD), for instance, mandates that all documentation be submitted in a format that complies with the CALS standard MIL-M-28001B. CALS (Continuous Acquisition and Life-cycle Support) is a DoD initiative to promote electronic document interchange between itself and its many contractors and subcontractors. MIL-M-28001B specifies formal Document Type Definitions (DTDs) for technical manuals in the required format. This lets Yoyodyne, Inc. develop its documentation on a Linux system using troff, but guarantees that the procurement clerk in the Pentagon who's running a Sun workstation using an ArborText system will be able to read, print, modify, and abstract its contents without going through a lot of contortions.
The guiding principle behind SGML is based on a concept called "markup"--namely, that text and word processing systems typically require additional information to be included with the natural text that makes up the content of a document. This added information serves two basic functions:
The use of "generalized markup" is what makes SGML's document definition capabilities so all-encompassing and powerful. A quote from Charles Goldfarb's classic tome on the subject, The SGML Handbook, explains this concept wonderfully (pp. 7-8):
"generalized markup" ... does not restrict documents to a single application, formatting style, or processing system. Generalized markup is based on two novel postulates:
- Markup should describe a document's structure and other attributes rather than specify processing to be performed on it, as descriptive markup need be done only once and will suffice for all future processing.
- Markup should be rigorous so that the techniques available for processing rigorously-defined objects such as programs and databases can be used for processing documents as well.
The key ideas here are that markup needs to be applied only once and can then drive multiple forms of output ("all future processing"), and that markup must be rigorous enough to support computerized parsing, data manipulation, and programmatic transformations or output. SGML delivers these capabilities in spades.
The heart of an SGML-based document is its governing Document Type Definition, or DTD. A DTD lays out the structural elements and markup definitions for a document, which can then be used to create actual document instances (which supply content that's organized within the framework of the DTD). This is why SGML is often described as a form of "descriptive markup"--meaning that it describes the elements in and organization of a document thoroughly, without necessarily addressing how such elements are presented. Most typical word processors, by contrast, use "procedural markup," which intermixes presentation, structure, and organization with content.
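The relationship between a DTD and its document instances can be sketched in a few lines of Python. This is only an illustration, not SGML syntax or a real SGML parser: the COOKBOOK_DTD table, the element names, and the validate function are all hypothetical, and the content model is simplified to an ordered list of required children, where a trailing "+" means "one or more."

```python
# Hypothetical, simplified stand-in for a DTD: each element maps to the
# ordered sequence of child elements it must contain. A trailing "+" means
# "one or more"; "#PCDATA" means the element holds text only.
COOKBOOK_DTD = {
    "cookbook": ["title", "recipe+"],
    "recipe": ["name", "ingredient+", "step+"],
    "title": ["#PCDATA"],
    "name": ["#PCDATA"],
    "ingredient": ["#PCDATA"],
    "step": ["#PCDATA"],
}


def validate(element, dtd):
    """Return a list of error strings; an empty list means the instance conforms."""
    name, content = element
    if name not in dtd:
        return [f"unknown element <{name}>"]
    model = dtd[name]
    if model == ["#PCDATA"]:
        return [] if isinstance(content, str) else [f"<{name}> must contain text only"]
    if isinstance(content, str):
        return [f"<{name}> must contain child elements"]

    errors = []
    pos = 0  # index of the next child not yet matched against the model
    for item in model:
        repeatable = item.endswith("+")
        child_name = item.rstrip("+")
        count = 0
        while pos < len(content) and content[pos][0] == child_name:
            errors.extend(validate(content[pos], dtd))  # check the child itself
            pos += 1
            count += 1
            if not repeatable:
                break
        if count == 0:
            errors.append(f"<{name}> is missing required <{child_name}>")
    if pos < len(content):
        errors.append(f"<{name}> has unexpected <{content[pos][0]}>")
    return errors


# Two instances of the same document type, each with its own content.
creole = ("cookbook", [
    ("title", "Creole Cooking"),
    ("recipe", [
        ("name", "Gumbo"),
        ("ingredient", "okra"),
        ("ingredient", "andouille sausage"),
        ("step", "Make a dark roux."),
    ]),
])
italian = ("cookbook", [
    ("title", "Italian Cooking"),
    ("recipe", [
        ("name", "Risotto"),
        ("ingredient", "arborio rice"),
        ("step", "Toast the rice, then add stock gradually."),
    ]),
])
# An instance that does not conform: no title, and a recipe with no ingredients.
broken = ("cookbook", [
    ("recipe", [("name", "Mystery Stew"), ("step", "Stir.")]),
])

for label, doc in [("creole", creole), ("italian", italian), ("broken", broken)]:
    print(label, validate(doc, COOKBOOK_DTD) or "conforms")
```

Both cookbooks pass because they share one type definition even though their recipes differ; the third instance is rejected on structural grounds alone, without any reference to how it would be displayed.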
The inability to separate presentation description from structure and organization is what makes word processor files so dependent on their particular applications--without the program that "understands" these peculiar formats, such files are largely unintelligible. SGML documents, on the other hand, can be read and "understood" by any system that can handle general SGML parsing, so long as the document's governing DTD is available to provide definitions for the document's structural and organizational elements. This also means that specific "output DTDs" can be created, to permit the same document to be presented in a variety of ways, tailored for specific media. Output DTDs let the same document instance be rendered differently (and customized completely) for hard copy output, CD-ROM delivery, and presentation via the World-Wide Web, among other uses. In a very small nutshell, this capability explains why so many organizations are moving to adopt SGML as the document description technology driving their publication processes.
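The idea of rendering one document instance differently for different media can be sketched as follows. This is a loose Python analogy rather than an actual output DTD: the DOCUMENT list, the element names, the two rule tables, and the render function are all hypothetical, and the only point is that the content is marked up once while each medium supplies its own presentation rules.

```python
from html import escape

# A hypothetical document instance: (element name, text) pairs in document
# order. The markup says what each piece is, not how it should look.
DOCUMENT = [
    ("title", "Gumbo"),
    ("para", "Serve over rice & garnish."),
    ("warning", "The roux is very hot."),
]

# Two hypothetical sets of presentation rules, one per output medium.
PLAIN_TEXT_RULES = {
    "title": lambda text: text.upper(),
    "para": lambda text: text,
    "warning": lambda text: "WARNING: " + text,
}
HTML_RULES = {
    "title": lambda text: "<h1>" + escape(text) + "</h1>",
    "para": lambda text: "<p>" + escape(text) + "</p>",
    "warning": lambda text: "<p><strong>" + escape(text) + "</strong></p>",
}


def render(document, rules):
    """Apply one medium's presentation rules to an unchanged document."""
    return "\n".join(rules[name](text) for name, text in document)


print(render(DOCUMENT, PLAIN_TEXT_RULES))
print(render(DOCUMENT, HTML_RULES))
```

Adding a third medium means writing another rule table; the document instance itself is never touched.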
Here's a short list of some of the agencies, consortia, and organizations that have produced standard DTDs:
Because SGML fosters a platform- and software-independent means of defining and exchanging documentation, it has become a representational tool of choice whenever multiple partners must exchange documents (especially large, complex ones). But because SGML also supports rigorous document definitions and descriptions, it is becoming a preferred tool for purely in-house publishing needs. Not only does SGML support computerized validation (to make sure a document conforms entirely to its governing DTD), it also permits the same document to be used to create a variety of forms of output. Since it's so much easier to maintain only one version of information than to try to synchronize multiple (and incompatible) versions, the move toward SGML for corporate publishing environments is gaining considerable momentum.